Abstract. The use of Cepheids as distance indicators on
Galactic and extragalactic distance scales is based upon the 
Cepheid period - luminosity (PL) and period - luminosity - 
colour (PLC) relations. These relations are usually derived in 
terms of the properties of Cepheids at mean light - i.e. av- 
eraged over their pulsation cycle. In this paper, we derive a 
physical argument for the existence of PL and PLC relations 
at maximum light. We examine in detail a sample of Cepheids 
in the Large Magellanic Cloud, and compare the variance of 
some PL and PLC type distance indicators based on mean and 
maximum light. 

We show that for the LMC data considered, a PLC relation 
based on maximum light leads to a distance estimator with a 
dispersion about 10% smaller than its counterpart using mean 
light. We also show that for the LMC, a PLC type relation con- 
structed using observations at both maximum and mean light 
has a significantly (> 50%) smaller dispersion than a PLC 
relation using either maximum or mean light alone. A compa- 
rable (> 30%) reduction in the dispersion of the corresponding 
distance estimator, however, in this case requires the relation 
be applied to a large (n > 30) group of equidistant Cepheids 
in, e.g., a distant galaxy. Recent HST observations of IC4182, 
M81 and M100 already provide suitable candidate data sets for
this relation. The use of maximum light in constructing PLC 
type relations for galactic and extragalactic Cepheids is, there- 
fore, shown to be an interesting topic for further study. These 
investigations are under way. 

Key words: Stars:oscillations, Stars:fundamental parameters, 
Cosmology:distance scale 



1. Introduction 

Cepheids are high luminosity radially pulsating variable stars. 
Their intrinsic brightness lies in the range $-2 > M_V > -6$, which
makes them suitable candidates for distance indicators on
Galactic and extragalactic distance scales. This is based on 
the Cepheid period - luminosity (PL) and period - luminosity 
- colour (PLC) relations. Examples of such PL and PLC rela- 
tions for the Magellanic Clouds are given in Figure 4 of Madore 
and Freedman (1991) and in Figure 3 of Caldwell and Coul- 
son (1986; hereafter CC) respectively. In these relations the 
luminosity, measured by the magnitude, and colour are mean
quantities taken over the pulsational cycle. Motivated by the 
work of Simon, Kanbur and Mihalas (1993; hereafter SKM), in 
this paper we derive a physical argument for the existence of 
PL and PLC relations for Cepheids based on their properties 
at maximum light. This argument thus provides a physical jus- 
tification for the work of Sandage and Tammann (1968), who 
introduced a PL relation at maximum light. However, we have 
extended their work by the introduction of a colour term and 
the simultaneous use of maximum and mean light. 

We then examine in detail a sample of Cepheids in the 
Large Magellanic Cloud (LMC), using multicolour photomet- 
ric data originally presented in Martin and Warren (1979) and 
discussed in Martin, Warren and Feast (1979; hereafter MWF). 
These data have (B-V) colours and were used purely as an il- 
lustration since this was the only data available to us at the 
time. The principles of our analysis are applicable in any wave- 
length range, however, and the extension of our analysis to 
other data sets will form part of our future work. Following 
the statistical formalism described in Hendry and Simmons 
(1990, 1994; hereafter HS90, HS94), we obtain 'optimal' (in 
the sense of unbiased and minimum variance) distance esti- 
mators corresponding to several different PL and PLC-type 
relations derived from this calibrating sample. We show that 
distance estimators based on the properties of the Cepheid at 
maximum light can have significantly smaller variance than 
those derived for Cepheids at mean light. 

The paper is organised as follows: Sect. 2 describes the the- 
oretical basis for the PL and PLC relations, originally intro- 
duced by Sandage (1958). Sect. 3 explains the physical reason- 
ing behind our reformulation and extension of these relations 
in terms of Cepheid maximum light and colour. Sect. 4 out- 
lines briefly how one may use our new relations to derive the 
corresponding Cepheid distance indicators. In the appendix 
we discuss the statistical model with which we derive these 
distance indicators, and describe the significance tests which 
we used to compare the variance of distance estimators de- 
fined for Cepheids at mean and maximum light. In Sect. 5 we 
describe the LMC Cepheid data upon which our analysis is 
based, and test the validity of the assumptions made in the 
appendix concerning the statistical properties of this sample. 
Sect. 6 presents our results and discussions. In Sect. 7 and 8 
we report our conclusions and point out further work in this 
area. 

2. Cepheid relations at mean light

The theoretical basis for the empirical Cepheid PL and PLC
relations was first outlined by Sandage (1958) as a consequence
of the period mean density relation for pulsating variable stars,
the Stefan Boltzmann law and the existence of a linear mass
luminosity relation for Cepheids. We outline this argument below.

The period mean density relation states that

$$P \sqrt{\bar{\rho}} = Q \qquad (1)$$

where $P$ is the Cepheid period, $Q$ is a slowly varying function
of stellar parameters and $\bar{\rho}$, the mean density, satisfies

$$\bar{\rho} \propto \mathcal{M} \mathcal{R}^{-3} \qquad (2)$$

where $\mathcal{M}$ is the total mass and $\mathcal{R}$ is the radius of the star. The
Stefan Boltzmann law, however, states that

$$L \propto \mathcal{R}_{eq}^{2} T_e^{4} \qquad (3)$$

where $L$ and $\mathcal{R}_{eq}$ are the equilibrium luminosity and radius
respectively, and $T_e$ is the effective temperature. For Cepheids,
we assume that the equilibrium $L$ and $\mathcal{R}_{eq}$ are close to their
average values over a pulsational cycle. It then follows that

$$\mathcal{R}^{3/2} \propto L^{3/4} T_e^{-3} \qquad (4)$$

Substituting equations (2) and (4) into equation (1), the
period mean density relation, and taking logarithms we obtain

$$\log P + \frac{1}{2}\log \mathcal{M} - \frac{3}{4}\log L + 3\log T_e = \log Q \qquad (5)$$

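If we assume that there exists a power law mass-luminosity
relation for Cepheids,

$$\log L = \alpha \log \mathcal{M} + \beta \qquad (6)$$

where $\alpha$ and $\beta$ are constants, then equation (5) may be reduced
to

$$\log P + \left(\frac{1}{2\alpha} - \frac{3}{4}\right)\log L + 3\log T_e = \log Q + \frac{\beta}{2\alpha} \qquad (7)$$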



Equation (7) is the theoretical basis for the Cepheid PLC 
relation, and its projection onto the plane of mean magnitude 
and period gives the PL relation - examples of which are given 
in Figure 4 of Madore and Freedman (1991). It can be seen 
that there is a scatter about the regression line of mean mag- 
nitude on period. That is, for a given period there is a range of 
mean magnitudes. This dispersion results from a combination 
of several different factors - including the effect of reddening, 
which may not be properly corrected, and observational errors 
in the apparent visual magnitudes. Another factor contribut- 
ing to the dispersion is intrinsic, however, and is caused by 
the finite width of the instability strip in the HR diagram. A 
Cepheid can be brighter than that luminosity given by the re- 
gression line by having a hotter surface temperature than a 
Cepheid of the same period but with a mean magnitude given 
exactly by the regression line. Similarly, a Cepheid of given 
period can be dimmer than the regression line by having a 
cooler surface temperature. This is completely in accord with
equation (7). Hence the possible range of Cepheid surface
temperatures (i.e. the range of $T_e$ in equation (7) at a given period),
as well as the other factors mentioned above, leads to the scatter
in the Cepheid PL relation (Sandage 1958). The possible
range of Cepheid surface temperatures is determined by the
width of the instability strip - the locus of $L$ and $T_e$ points in
the HR diagram in which pulsation occurs. It is also clear that
a smaller range of $T_e$ in equation (7) will lead to a tighter
correlation between $\log P$, $\log L$ and $\log T_e$. We discuss instances in
which this is the case in the next section.
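
The algebra leading from equations (1) - (6) to equation (7) can be checked numerically. The sketch below is only an illustration: it builds a toy star with all proportionality constants set to one, and the values chosen for alpha, beta and log Q are arbitrary assumptions rather than calibrated quantities. The function name plc_residual is hypothetical.

```python
import math

def plc_residual(mass, temp, alpha=3.5, beta=0.2, log_q=-1.4):
    """Left minus right hand side of equation (7) for a toy star.

    alpha, beta and log_q are arbitrary illustrative values and all
    proportionality constants are set to one (arbitrary units).
    """
    log_m = math.log10(mass)
    log_t = math.log10(temp)
    log_l = alpha * log_m + beta            # equation (6)
    log_r = 0.5 * log_l - 2.0 * log_t       # equation (3): L = R^2 T^4
    log_rho = log_m - 3.0 * log_r           # equation (2)
    log_p = log_q - 0.5 * log_rho           # equation (1)
    lhs = log_p + (1.0 / (2.0 * alpha) - 0.75) * log_l + 3.0 * log_t
    rhs = log_q + beta / (2.0 * alpha)
    return lhs - rhs

residual = plc_residual(5.0, 5800.0)
```

The residual vanishes to rounding error for any mass and temperature, which is simply a restatement of the fact that equation (7) follows identically from the assumed relations.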

There has been considerable debate in the literature over 
the exact cause of the scatter in the Cepheid PL relation. Some 
authors claim that observational errors and an incorrect al- 
lowance for reddening combined with a narrow, but non zero, 
intrinsic width of the instability strip mean that it is difficult to 
disentangle the effects of reddening and intrinsic temperature 
variations (Madore and Freedman 1991; Clube and Dawe 1980;
Stift 1982, 1990). Others maintain that observations and esti- 
mates of reddening and colour excess are accurate enough to 
imply that the scatter in the PL relation is due to intrinsic tem- 
perature variations (MWF; CC; Feast and Walker 1987). The 
arguments presented in these papers and the work of Laney 
and Stobie (1986) who showed the existence of a significant 
colour term in the infrared PLC relation for LMC Cepheids 
provide ample support for this latter view. 



3. The Cepheid at Maximum light 



At maximum light, the period mean density law is still valid.
We assume, moreover, that the Stefan Boltzmann law still applies,

$$L_{max} \propto \mathcal{R}_{max}^{2} T_{max}^{4} \qquad (8)$$

where $L_{max}$ is the maximum luminosity, $\mathcal{R}_{max}$ is the radius at
light maximum and $T_{max}$ is the photospheric temperature at
maximum light. In the optical and ultraviolet most of the light
variations in Cepheids are caused by temperature fluctuations,
not radius variations (c.f. Cox 1974, McGonegal et al 1982) and
in any case maximum light occurs when the star is expanding
through its equilibrium radius (Cox 1974; SKM), so we can
also assume

$$\mathcal{R}_{max} \simeq \mathcal{R} \qquad (9)$$

where $\mathcal{R}$ is the equilibrium radius of the star. However, even at
longer wavelengths, the relative radius fluctuations are no more
than five to ten percent (Cox 1974) and equation (9) will still
be approximately valid. Thus our physical derivation would
still be applicable in the infrared. Therefore, using equations
(8) and (9) we obtain

$$\mathcal{R}^{3/2} \propto L_{max}^{3/4} T_{max}^{-3} \qquad (10)$$

Substituting equation (10) into the period mean density
law, taking logarithms and assuming equation (6), yields

$$\log P + \frac{1}{2\alpha}(\log L - \beta) - \frac{3}{4}\log L_{max} + 3\log T_{max} = \log Q \qquad (11)$$



Equation (11) suggests the existence of a relation between 
period, mean luminosity, maximum luminosity and maximum 
temperature. The appearance of the $T_{max}$ term in equation (11)
is important because at maximum light, the range of Cepheid
photospheric temperatures is smaller (about 600 K) than at
mean light (about 1000 K), as was shown in SKM. This is
because at maximum light the photosphere occurs at the base
of the hydrogen ionization zone, independent of the period of
the Cepheid. For typical Cepheid densities and temperatures,
the hydrogen ionization zone occurs at about 6200 K, although
very long period Cepheids ($P > 40$ days) have such extended
envelopes that the photosphere is further out in the envelope.
Consequently these longer period Cepheids have at maximum
light a range of photospheric temperatures similar to that at
mean light (SKM). If one considers only Cepheids of shorter
period ($P < 40$ days), however, then one would expect the
range of $T_{max}$ in equation (11) to be smaller than the range
of $T_e$ in equation (7). Consequently one can expect that PL
and PLC relations based on maximum light may have smaller
scatter than those based on mean light.

Another advantage of this approach is the following. The
Stefan Boltzmann law applied at mean and maximum light
yields

$$L^{-3/4} T_e^{3} = L_{max}^{-3/4} T_{max}^{3} \qquad (12)$$

which implies

$$-\frac{3}{4}\log L + 3\log T_e = -\frac{3}{4}\log L_{max} + 3\log T_{max} \qquad (13)$$

Substituting for $\log T_e$ from equation (7) into the above equation
yields equation (11). We are constructing a relationship
using the properties of the pulsation at two phase points (mean
and maximum) to obtain equation (11), rather than using just
the properties at mean light as in equation (7). Hence Cepheid
relations based on equation (11) incorporate more information
about the pulsation than their counterparts based on equation
(7).
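
As a consistency check on equation (11), the following sketch constructs a toy star whose radius at maximum light equals its equilibrium radius, as assumed in equation (9), and evaluates both sides of equation (11). It is an illustration under stated assumptions only: the values of alpha, beta, log Q and the two temperatures are arbitrary, proportionality constants are set to one, and the function name max_light_residual is hypothetical.

```python
import math

def max_light_residual(mass, temp_mean, temp_max,
                       alpha=3.5, beta=0.2, log_q=-1.4):
    """Left minus right hand side of equation (11) for a toy star.

    All parameter values are arbitrary illustrative assumptions.
    """
    log_m = math.log10(mass)
    log_l = alpha * log_m + beta                       # equation (6)
    log_r = 0.5 * log_l - 2.0 * math.log10(temp_mean)  # equation (3)
    log_p = log_q - 0.5 * (log_m - 3.0 * log_r)        # equations (1), (2)
    # Equations (8) and (9): same radius, hotter photosphere at maximum.
    log_l_max = 2.0 * log_r + 4.0 * math.log10(temp_max)
    lhs = (log_p + (log_l - beta) / (2.0 * alpha)
           - 0.75 * log_l_max + 3.0 * math.log10(temp_max))
    return lhs - log_q

residual_max = max_light_residual(5.0, 5500.0, 6200.0)
```

The residual is zero to rounding error whatever value is chosen for the temperature at maximum light, because the radius cancels between the mean and maximum light forms of the Stefan Boltzmann law.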

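We can rewrite equation (11) in the following form:

$$\log P + \frac{1}{2\alpha}(\log L - \log L_{max} + \log L_{max} - \beta) - \frac{3}{4}\log L_{max} + 3\log T_{max} = \log Q \qquad (14)$$

which implies

$$\log P + \frac{1}{2\alpha}(\log L - \log L_{max}) + \left(\frac{1}{2\alpha} - \frac{3}{4}\right)\log L_{max} + 3\log T_{max} = \log Q + \frac{\beta}{2\alpha} \qquad (15)$$

Equation (15) suggests the possibility of a period, maximum
light, semi-amplitude relationship.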

We can convert equation (11) into a form which can be
more easily compared with observations by writing

$$\log L = 0.4\,(c - M_{bol}) \qquad (16)$$

and

$$\log L_{max} = 0.4\,(c - M_{bol})_{max} \qquad (17)$$

where $M_{bol}$ denotes the absolute bolometric magnitude, which
is related to the absolute visual magnitude, $M_V$, by

$$M_V = M_{bol} + BC \qquad (18)$$

We can convert the bolometric correction, BC, and the
temperature to an observed colour using relations of the form

$$BC = a + b\,(B-V)_0 \qquad (19)$$

and

$$\log T = x + y\,(B-V)_0 \qquad (20)$$

where $a$, $b$, $x$ and $y$ are constants and $(B-V)_0$ is the dereddened
colour.

Combining equations (15) - (20) we have

$$\log P - \frac{0.2}{\alpha} M_V + 0.3\,M_{V,max} + \frac{0.2\,b}{\alpha}(B-V)_0 + (3y - 0.3\,b)(B-V)_{0,max} = \mathrm{constant} \qquad (21)$$

Clearly one can also convert the period, maximum light,
semi-amplitude relation suggested by equation (15) into a similar
form involving magnitudes and colours. Much as equation
(7) is the theoretical justification for the Cepheid PLC and PL
relations at mean light (Sandage 1958; Sandage and Tammann
1968), equation (11) is the theoretical justification for Cepheid
PLC and PL relations at maximum light. Although Sandage
and Tammann (1968) constructed a period maximum light relation
for Cepheids in the Large and Small Magellanic Clouds,
M31 and NGC 6822, to our knowledge the above argument has
never been given.

4. Cepheids as distance indicators

As we indicated in Sect. 1, our main motivation in studying
and extending PL and PLC relations for Cepheids is their usefulness
as primary distance indicators. In this section we illustrate
how one would apply our new relations, calibrated with a
sample of Cepheids of known distance, to infer the distance of
other Cepheids grouped in a more distant galaxy or cluster. We
follow the statistical formalism and notation adopted in HS90
and HS94 in discussing optimal galaxy distance indicators such
as the Tully-Fisher or $D_n - \sigma$ relations. In particular, we adopt
the standard statistical convention of denoting an estimator of
a quantity by a caret. In order to avoid a surfeit of confusing
subscripts we also drop the subscript 'v' from the absolute and
apparent visual magnitude.

The basic relationship between the absolute magnitude, $M$,
and apparent magnitude, $m$, (assumed corrected for absorption)
for an object at distance $D$ Mpc is

$$m = M + 5\log D + 25 \qquad (22)$$

For Cepheids, these magnitudes are usually taken to be the
mean value over their pulsation cycle. Clearly, however, for any
Cepheid we also have

$$m_{max} = M_{max} + 5\log D + 25 \qquad (23)$$

An obvious estimate of the distance, $D$, (or more correctly
the log distance) of the Cepheid is then

$$\log \hat{D} = 0.2\,(m - \hat{M} - 25) \qquad (24)$$

or

$$\log \hat{D} = 0.2\,(m_{max} - \hat{M}_{max} - 25) \qquad (25)$$

Here $\hat{M}$ and $\hat{M}_{max}$ denote estimators of the mean and maximum
absolute magnitude respectively. Examples of such estimators
include

$$\hat{M} = a_1 + b_1 \log P \qquad (26)$$

$$\hat{M} = a_2 + b_2 \log P + c_2\,(B-V)_0 \qquad (27)$$

where $P$ and $(B-V)_0$ take their usual meanings and $a_1$, $b_1$,
$a_2$, $b_2$ and $c_2$ are constants. The motivation for considering
estimators of this form is clearly the PL and PLC relations -
the theoretical basis for which has already been described in
the preceding sections.
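
To make the use of such estimators concrete, the sketch below evaluates a PL-type estimator of the form of equation (26) and converts it to a log distance using the distance modulus relation of Sect. 4. The coefficients, period and apparent magnitude are hypothetical illustrative numbers, not values fitted in this paper, and the function names are assumptions.

```python
import math

def absolute_magnitude_pl(period_days, a1, b1):
    """PL-type estimator of absolute magnitude, as in equation (26)."""
    return a1 + b1 * math.log10(period_days)

def log_distance_mpc(apparent_mag, absolute_mag_estimate):
    """Log distance (D in Mpc) from an apparent magnitude and an
    estimated absolute magnitude: log D = 0.2 (m - M - 25)."""
    return 0.2 * (apparent_mag - absolute_mag_estimate - 25.0)

# Hypothetical coefficients and observations, for illustration only.
A1, B1 = -1.4, -2.8
m_hat = absolute_magnitude_pl(10.0, A1, B1)
log_d = log_distance_mpc(24.3, m_hat)
distance_mpc = 10.0 ** log_d
```

The same two functions apply unchanged at maximum light if the maximum apparent magnitude and a maximum light estimator are supplied instead.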

The constant coefficients in equations (26) and (27) are
obtained by fitting such relations to the relevant observations
- i.e. to the apparent visual magnitude, period and colour -
for Cepheids of known distance. For example, if the calibrating
Cepheids are all at distance $D_{cal}$ Mpc, then we may rewrite
equation (26) as

$$\hat{m} - 5\log D_{cal} - 25 = a_1 + b_1 \log P \qquad (28)$$

In other words the estimator of absolute visual magnitude
given by equation (26) is equivalent to the estimator of apparent
visual magnitude given by equation (28). That is, in order
to best fit the intrinsic relations involving mean and maximum
absolute magnitudes, we need only consider the corresponding
relations involving apparent magnitudes, provided all the
calibrating stars are assumed equidistant. Similar remarks apply
to all the PLC-type relations discussed in this paper.

The fitting procedure mentioned above is usually a maximum
likelihood or linear regression analysis, and requires a
statistical model for the relationship between the relevant
variables. A suitable model for the PL relation, for example, might
be

$$M = a + b \log P + \epsilon \qquad (29)$$

where the errors, $\epsilon$, are taken to be normally distributed with
zero mean and variance $\sigma^2$. Given such an error model, the
estimator, $\hat{M}$, given by equation (26) with $a_1 = a$ and $b_1 = b$
corresponds to the expected absolute magnitude conditional
upon log period, and the values of $a$ and $b$ are in this case
equal to the zero point and slope respectively of the direct
linear regression of $M$ on $\log P$. An example of a more
sophisticated model for the PLC relation is given in the appendix of
CC. In the appendix of this paper we describe the statistical
model adopted in the present analysis.
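
The direct regression described above can be sketched in a few lines. The example below fits a straight line by ordinary least squares to synthetic data generated from a model of the form of equation (29). It is not the fitting code used in this paper: the true coefficients, scatter, period range and sample size are all invented for the illustration, and fit_line is a hypothetical helper.

```python
import math
import random

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b x.

    Returns the zero point a, the slope b and the rms residual
    (computed with n - 2 degrees of freedom).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, math.sqrt(ss_res / (n - 2))

# Synthetic calibrating sample; every number here is an assumption.
rng = random.Random(1)
TRUE_A, TRUE_B, TRUE_SIGMA = -1.4, -2.8, 0.25
log_period = [rng.uniform(0.4, 1.6) for _ in range(400)]
magnitude = [TRUE_A + TRUE_B * lp + rng.gauss(0.0, TRUE_SIGMA)
             for lp in log_period]

a_fit, b_fit, sigma_fit = fit_line(log_period, magnitude)
```

With a few hundred synthetic stars the fitted zero point, slope and scatter should lie close to the input values, which is the sense in which the regression recovers the conditional expectation of magnitude given log period.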

Given an appropriate statistical model it is straightforward
to determine the distribution, $p(\log \hat{D} | D_T)$, of $\log \hat{D}$ conditional
upon the true Cepheid distance, $D_T$. One can then use the
moments of this distribution as a suitable criterion for comparing
the properties of different estimators. For example, a desirable
property is that $\log \hat{D}$ be unbiased. This requires that $\log \hat{D}$
satisfies the relation

$$E(\log \hat{D} | D_T) = \int \log \hat{D} \; p(\log \hat{D} | D_T) \, d\log \hat{D} = \log D_T \qquad (30)$$

(c.f. eqn. [4] of HS94).

In other words $\log \hat{D}$ will, on average, yield the true log
distance of the Cepheid, whatever that true distance is. In a
similar manner the risk, $\mathcal{R}$, of an estimator is defined as the
second moment of $p(\log \hat{D} | D_T)$ about the true log distance, i.e.

$$\mathcal{R} = \int (\log \hat{D} - \log D_T)^2 \; p(\log \hat{D} | D_T) \, d\log \hat{D} \qquad (31)$$

(c.f. eqn. [5] of HS94). Note that for an unbiased estimator the
risk is identically equal to the variance. In this study we will
consider as optimal log distance estimators which are unbiased
and which have minimum risk or variance - a criterion which
we discuss more fully in the appendix.
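
The definitions of bias and risk can be illustrated with a small Monte Carlo experiment. The sketch below assumes the simplest possible error model, in which the true absolute magnitude scatters normally about the predicted value, and estimates the bias and risk of the resulting log distance estimator. The scatter of 0.25 mag, the true log distance and the predicted magnitude are arbitrary assumptions.

```python
import random

def simulate_log_distance_errors(sigma_mag, log_d_true, n_trials, seed=0):
    """Simulate errors in the log distance estimator for one Cepheid.

    Assumes a normal scatter of width sigma_mag (in magnitudes) of the
    true absolute magnitude about the predicted value.
    """
    rng = random.Random(seed)
    predicted_abs_mag = -4.0  # arbitrary predicted absolute magnitude
    errors = []
    for _ in range(n_trials):
        true_abs_mag = predicted_abs_mag + rng.gauss(0.0, sigma_mag)
        apparent = true_abs_mag + 5.0 * log_d_true + 25.0
        log_d_hat = 0.2 * (apparent - predicted_abs_mag - 25.0)
        errors.append(log_d_hat - log_d_true)
    return errors

errors = simulate_log_distance_errors(0.25, 1.2, 20000)
bias = sum(errors) / len(errors)                     # sample first moment
risk = sum(e * e for e in errors) / len(errors)      # sample second moment
```

Under these assumptions the simulated bias is consistent with zero and the simulated risk is close to 0.04 times the squared magnitude scatter, independent of the true distance adopted.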

Generally, better physics should translate into better statistics.
For example, equation (27) reflects a more complete physical
model for $M$ than that given by equation (26), and we
would therefore expect that the distance estimator of equation
(24) would have a smaller variance if $\hat{M}$ is given by equation
(27) rather than equation (26). In fact, previous studies such
as MWF and CC have indeed shown that for LMC Cepheids,
the introduction of a colour term offers a significant reduction
in scatter over a PL relation.

In this study we investigate the variance of log distance
estimators taking the form of equations (24) and (25), where
our models for $\hat{M}$ and $\hat{M}_{max}$ are prompted by equations (11)
and (15) respectively.

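It follows trivially from equations (22) - (25) that for an
unbiased estimator $\mathcal{R}$ is given by

$$\mathcal{R} = 0.04\, E(M_* - \hat{M}_*)^2 \qquad (32)$$

where an asterisk denotes 'mean' or 'maximum' magnitude as
appropriate. The rms percentage distance error, $\Delta$, of $\log \hat{D}$ is
given by

$$\Delta \simeq 20 \ln 10 \; \sigma_{\hat{M}_*} \qquad (33)$$

where $\sigma_{\hat{M}_*}^2 = E(M_* - \hat{M}_*)^2$.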
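
For orientation, the two expressions above are easily evaluated. The sketch below converts an assumed magnitude dispersion into the risk of the log distance estimator and the corresponding rms percentage distance error. The dispersion of 0.25 mag is purely illustrative and is not a result of this paper.

```python
import math

def risk_from_magnitude_dispersion(sigma_mag):
    """Risk of an unbiased log distance estimator, as in equation (32)."""
    return 0.04 * sigma_mag ** 2

def rms_percentage_distance_error(sigma_mag):
    """Approximate rms percentage distance error, as in equation (33)."""
    return 20.0 * math.log(10.0) * sigma_mag

example_risk = risk_from_magnitude_dispersion(0.25)
example_delta = rms_percentage_distance_error(0.25)
```

Because the percentage error is linear in the magnitude dispersion, a given fractional reduction in dispersion translates into the same fractional reduction in the rms distance error.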



Specifically, in this paper we examine the following estimators
of mean or maximum absolute magnitude.

$$\hat{M} = a + b \log P \qquad (34)$$

$$\hat{M}_{max} = a + b \log P \qquad (35)$$

$$\hat{M} = a + b \log P + c\,(B-V) \qquad (36)$$

$$\hat{M}_{max} = a + b \log P + c\,(B-V)_{max} \qquad (37)$$

$$\hat{M}_{max} = a + b \log P + c\,(B-V)_{max} + d\,(M_{mean} - M_{max}) \qquad (38)$$

$$\hat{M}_{max} = a + b \log P + c\,(B-V)_{max} + d\,M_{mean} \qquad (39)$$

Of course the constants $a$, $b$, $c$ and $d$ are different in each
case. Again, to avoid a surfeit of subscripts we have dropped
the subscript '0' from $(B-V)$ and $(B-V)_{max}$. Henceforth,
unless we state otherwise, all colours will be assumed to be
corrected for reddening. Equations (34) and (36) are the standard
Cepheid PL and PLC relations at mean light. Equation
(35) is a Cepheid PL (max) relation, as described in Sandage
(1968). Equations (37), (38) and (39) are new and are based
upon equations (11) and (15).

Note that the semi-amplitude term on the right hand side
of equation (38) involves the true value of both $M_{mean}$ and
$M_{max}$, neither of which is directly observable (otherwise we
would have no need to estimate them!). This presents no difficulty,
since $M_{mean} - M_{max}$ may be re-written as $m_{mean} - m_{max}$,
i.e. the difference between the mean and maximum apparent
magnitude, a quantity which is readily observable. Thus,

$$\hat{M}_{max} = a + b \log P + c\,(B-V)_{max} + d\,(m_{mean} - m_{max}) \qquad (40)$$

The presence of $M_{mean}$ on the right hand side of equation
(39), however, cannot be dealt with in this way. In order to
overcome the obvious problem that $M_{mean}$ is not directly
observable, and in addition to avoid fitting a relation which is
distance degenerate, we replace equation (39) by

$$\hat{M}'_{max} = a + b \log P + c\,(B-V)_{max} + d\,(M_{mean} - \langle M_{mean} \rangle) \qquad (41)$$

where $\langle M_{mean} \rangle$ denotes the sample averaged absolute magnitude
at mean light of a group of equidistant Cepheids (e.g. in
a distant galaxy or cluster, the distance of which we wish to
estimate). Of course the zero point, $a$, of equation (41) will differ
from that in equation (39). Note that $M_{mean} - \langle M_{mean} \rangle$ is a
distance independent quantity (as is $M_{mean}$) but is directly
observable, being equal simply to $m_{mean} - \langle m_{mean} \rangle$. Note also,
however, that the distribution of $\langle M_{mean} \rangle$ depends upon the
sample size of Cepheids, and in particular, that one could not
sensibly apply equation (41) to estimate the maximum absolute
magnitude of an individual field Cepheid.
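
The statement that the centred mean magnitude is directly observable can be verified with a toy example. In the sketch below a group of Cepheids is placed at a common distance, and the sample averaged magnitude is subtracted from both the absolute and the apparent magnitudes. The four absolute magnitudes and the distance of 5 Mpc are invented for illustration.

```python
import math

def distance_modulus(distance_mpc):
    """Distance modulus 5 log D + 25 for a distance D in Mpc."""
    return 5.0 * math.log10(distance_mpc) + 25.0

def centred_mean_magnitudes(mean_mags):
    """Subtract the sample average from each mean light magnitude."""
    average = sum(mean_mags) / len(mean_mags)
    return [m - average for m in mean_mags]

# Hypothetical absolute mean magnitudes for a small group of Cepheids.
absolute_mean = [-3.1, -3.8, -4.4, -5.0]
# The same stars observed at an assumed common distance of 5 Mpc.
apparent_mean = [M + distance_modulus(5.0) for M in absolute_mean]

centred_absolute = centred_mean_magnitudes(absolute_mean)
centred_apparent = centred_mean_magnitudes(apparent_mean)
```

The two centred lists agree because the common distance modulus cancels in the subtraction, which is why the relation can be applied without prior knowledge of the distance, provided the stars are equidistant.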

Some algebra easily establishes that the variance of $\hat{M}'_{max}$
in equation (41) is given by the sum of the variance of $\hat{M}_{max}$
in equation (39) and the variance of $\langle M_{mean} \rangle$, i.e.

where $\sigma_m$ is the dispersion of the intrinsic distribution of
magnitudes at mean light (which of course is not known a priori,
but which is estimated from the LMC calibrating Cepheids)
and $n$ is the number of observed Cepheids in the more distant
galaxy or cluster. The risk, $\mathcal{R}$, of the corresponding log
distance estimator is given by $\mathcal{R} = 0.04\,\sigma^2_{\hat{M}'_{max}}$, in accordance
with equation (32) above.

Finally in this section we should note that the above analy- 
sis takes no account of the metallicity dependence of our fitted 
relations. Although the Cepheid PL relations (34) and (35) 
are insensitive to composition differences, this will not be the 
case for the remaining PLC-type relations. One can overcome 
this problem as follows. One fits equations (36) - (39) to the
LMC data and then adjusts the coefficients of the fit (using
e.g. Table (bl) in CC) to give the PLC relation of a normal
metal abundance Cepheid. The zero point of this corrected
relation can then be calibrated using Galactic Cepheids of
known distance. CC use the results of Iben and Tuggle (1975),
Becker, Iben and Tuggle (1977) and Bell and Gustafsson (1978)
to obtain their Table (bl). The composition dependence
of the Cepheid mass luminosity law is part of the reason for
the metallicity sensitivity of the PLC. It is not known exactly
how this will affect e.g. the PLC(max) relation. Since Feast
and Walker (1987) suggest that the metal deficiency of LMC 
Cepheids is slight, however, (1.4 times less than solar), it seems 
likely that the difference between the values of b and c in equa- 
tion (37) and those appropriate for a metal normal Cepheid 
will be small. Because our primary intention in this paper is 
to establish how the use of Cepheids at mean and maximum 
light affects the dispersion of our distance estimators, we leave 
a more precise zero point calibration of relations (36) to (39) 
- accounting for metallicity dependence as described above - 
for subsequent papers. 



5. LMC Data 

In order to test our assertion of the existence of Cepheid rela- 
tions of the form suggested by equations (7), (11) and (15), we 
use multicolour photoelectric observations of LMC Cepheids 
taken by Martin and Warren (1979) and discussed in MWF. 
These observations were taken in the BVI system. The list of 
stars used in this analysis is given in Table 1, together with 
their period in days. Note that some of the stars in the orig- 
inal data set presented in Martin, Warren and Feast (1979) 
were omitted because there were not enough observed points 
to obtain accurate mean and maximum magnitudes. These ex- 
cluded stars are shown in Table 2. Rejecting these stars resulted 
in the total sample of 39 stars shown in Table 1. MWF and 
Feast (1984) have discussed the small dispersion in reddenings 
for their LMC Cepheids: hence we adopt a constant value of 
$E(B-V) = 0.1$ given by Madore and Freedman (1991). We
take $R = A_V / E(B-V)$ to be 3.3, again following MWF. Since
the only data set available to us was in magnitudes and not 
intensity fluxes, our PL and PLC relations are determined in 
terms of magnitudes. 
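
The reddening correction implied by these adopted values can be written out explicitly. The sketch below applies a constant colour excess of 0.1 and a ratio of total to selective absorption of 3.3 to a single observation. The input magnitude and colour are hypothetical, and the function name deredden is an assumption.

```python
E_BV = 0.1   # adopted constant colour excess E(B-V)
R_V = 3.3    # adopted ratio A_V / E(B-V)

def deredden(v_mag, b_minus_v, e_bv=E_BV, r_v=R_V):
    """Return the absorption corrected V magnitude and (B-V)_0 colour."""
    a_v = r_v * e_bv  # visual absorption in magnitudes
    return v_mag - a_v, b_minus_v - e_bv

# Hypothetical observed mean V magnitude and colour of one star.
v0, colour0 = deredden(14.0, 0.75)
```

Because the same constant correction is applied to every star, it shifts the zero points of the fitted relations but does not alter their scatter.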

Figure 1 shows a plot of period vs. semi-amplitude for all 
39 stars in our sample. It clearly shows a group of stars at long 
period which are well separated from the other stars in the sam- 
ple. Rather than postulate why this is so, we will present results 
with and without this set of stars. These stars are presented in 
Table 3, marked by an asterisk in the third column. Recall that
in Sect. 3 and in SKM it was suggested that at maximum light
Cepheids have a smaller range of surface temperatures than at
mean light, but that for periods P > 40 days this distinction
is not found. Since the range of surface temperatures also
introduces scatter, we will consider a third subset of the data
consisting only of those stars with periods P < 40 days. Stars
with periods P > 40 days are also marked in Table 3.
Moreover, note that the disjoint group of stars listed in Table
3 all have periods P > 40 days, and so are also rejected from this
third sample. We therefore consider three data sets, with 39,
35 and 31 stars respectively. In the appendix, we introduce a
multivariate normal model to describe the joint distribution of
the six relevant variables in the Cepheid data: maximum
magnitude, mean magnitude, maximum colour, mean colour, log
period and semi-amplitude. We require this joint distribution
to be normal in order to make a strictly valid application in
Sect. 6 of our hypothesis tests that the variance of distance
estimators based on maximum light is significantly smaller than
for those based upon mean light.

Fig. 1. Log period plotted versus semi-amplitude for the 39 LMC
Cepheids in our calibrating sample.

We can test the normality of the distribution of each of 
these variables in several ways. First, we plot a sample cumu- 
lative distribution function (CDF) and compare this with the 
theoretical CDF of a normal distribution with mean and dis- 
persion equal to the sample mean and dispersion of that vari- 
able. Figures 2, 3 and 4 show selected results obtained for the 
samples of 31, 35 and 39 stars respectively. The agreement with 
the theoretical normal curves is generally quite good, although
it is somewhat worse for the distribution of semi-amplitude than
for the other variables. To quantify this agreement we apply
the Kolmogorov-Smirnov (KS) test (c.f. Kendall & Stuart,
1963), for which the test statistic, $D_n$, is defined as the maximum
absolute deviation between the sample and model CDF
curves. Table 4 lists $D_{obs}$, the observed value of the test statistic,
for each of the six relevant observables and using each of
the three data sets under consideration, together with the
corresponding significance of the KS test - i.e. the probability that
$D_n > D_{obs}$ under the null hypothesis that the sample CDF is
drawn from the modelled theoretical distribution.

Fig. 2. Comparison of sample CDF curve with the normal curve of
equal mean and dispersion. The examples shown are for mean and
maximum apparent magnitude, for the subsample of 31 stars.

It is clear from these results that, on the basis of the KS
test, there is no strong evidence to reject the hypothesis that
the six variables are normally distributed - although it is also
clear that the hypothesis is much more strongly accepted for
the restricted data set of 31 stars than for the full sample.

Fig. 3. Comparison of sample CDF curve with the normal curve of
equal mean and dispersion. The examples shown are for mean and
maximum colour, for the subsample of 35 stars.
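
The KS statistic defined above, the maximum absolute deviation between the sample CDF and a normal CDF of equal mean and dispersion, can be computed as in the following sketch. This is an illustration rather than the code used for Table 4: the two test samples are synthetic, and the use of the n - 1 form of the sample dispersion is an assumption. Note also that when the mean and dispersion are estimated from the same sample, standard KS significance levels tend to be conservative.

```python
import math
import random

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal variable."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic_normal(sample):
    """Maximum absolute deviation between the sample CDF and the normal
    CDF with mean and dispersion equal to those of the sample."""
    n = len(sample)
    mu = sum(sample) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in sample) / (n - 1))
    d_max = 0.0
    for i, x in enumerate(sorted(sample)):
        model = normal_cdf(x, mu, sigma)
        # The sample CDF steps from i/n to (i+1)/n at each data point.
        d_max = max(d_max, abs(model - i / n), abs((i + 1) / n - model))
    return d_max

# Synthetic samples, for illustration only.
rng = random.Random(3)
gaussian_sample = [rng.gauss(17.0, 0.6) for _ in range(200)]
bimodal_sample = [0.0] * 50 + [10.0] * 50

d_gaussian = ks_statistic_normal(gaussian_sample)
d_bimodal = ks_statistic_normal(bimodal_sample)
```

A sample genuinely drawn from a normal distribution gives a small statistic, whereas the strongly bimodal sample gives a much larger one.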

The main advantage of the KS test is its robustness, how- 
ever, and it would seem prudent to apply more powerful tests 
of normality to confirm the validity of our assumptions. We 
next calculated the sample skewness, $\mathcal{S}$, and kurtosis, $\mathcal{K}$, for
each variable, $x$, defined as

$$\mathcal{S} = \frac{1}{N} \sum_{j=1}^{N} \left( \frac{x_j - \bar{x}}{\sigma} \right)^3 \qquad (43)$$

and

$$\mathcal{K} = \frac{1}{N} \sum_{j=1}^{N} \left( \frac{x_j - \bar{x}}{\sigma} \right)^4 - 3 \qquad (44)$$

where $\bar{x}$ and $\sigma$ denote the sample mean and standard deviation
of the sample of size, $N$, respectively.

The intrinsic skewness and kurtosis should both be identically
zero for a normal distribution. The sampled value of each
statistic will fluctuate around zero, however, and the variance
of the sampling distribution depends upon the sample size, $N$.
Under the null hypothesis of a normal variable, the variance
of the sample skewness and kurtosis is approximately equal to
15/N and 96/N respectively (c.f. Kendall and Stuart, 1963).

Fig. 4. Comparison of sample CDF curve with the normal curve of
equal mean and dispersion. The examples shown are for log period
and semi-amplitude, for the full sample of 39 stars.
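
The skewness and kurtosis statistics, and their expression in units of the approximate null hypothesis standard deviations quoted above, can be computed as follows. This is a minimal sketch: the five point sample is an arbitrary symmetric example, and the use of the n - 1 form of the sample standard deviation is an assumption.

```python
import math

def skewness_kurtosis(sample):
    """Sample skewness and kurtosis in the form of equations (43), (44)."""
    n = len(sample)
    mean = sum(sample) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    skew = sum(((x - mean) / sigma) ** 3 for x in sample) / n
    kurt = sum(((x - mean) / sigma) ** 4 for x in sample) / n - 3.0
    return skew, kurt

def in_sigma_units(skew, kurt, n):
    """Express both statistics in units of their approximate standard
    deviations under normality, sqrt(15/N) and sqrt(96/N)."""
    return skew / math.sqrt(15.0 / n), kurt / math.sqrt(96.0 / n)

example = [-2.0, -1.0, 0.0, 1.0, 2.0]   # arbitrary symmetric sample
skew, kurt = skewness_kurtosis(example)
skew_sig, kurt_sig = in_sigma_units(skew, kurt, len(example))
```

For any sample that is symmetric about its mean the skewness vanishes identically, so only the kurtosis carries information in this example.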

Table 5 lists the sample skewness and kurtosis, expressed 
in terms of the number of standard deviations under the null 
hypothesis of a normal distribution, for each of our six Cepheid 
observables and for the three different sample sizes considered. 
From these results we see that the sample of 31 stars shows
no significant evidence of non-zero skewness and kurtosis - the
largest sampled skewness being $\simeq 0.72\sigma$ for mean apparent
magnitude. In the larger samples, however, there is some
indication of significant skewness above the one $\sigma$ level for mean
magnitude and maximum colour.

The final test which we apply is based upon Shapiro and
Wilk's W statistic (Royston, 1982). This is a considerably more
powerful test and uses the properties of the order statistics of a
normal distribution - i.e. the properties of a sample arranged
in ascending or descending order (c.f. David, 1981; Hendry,
O'Dell and Collier Cameron, 1993). The test statistic is given
by

$$W = \frac{\left( \sum_{j=1}^{N} a_j x_{(j)} \right)^2}{\sum_{j=1}^{N} (x_j - \bar{x})^2} \qquad (45)$$

where $x_{(j)}$ denotes the $j$th order statistic of the sample, $\bar{x}$
denotes the sample mean and the set of normalising
weights, $a_j$, depend upon the sample size, $N$.

The mean value of the statistic under the null hypothesis
of a sample drawn from a normal distribution is close to unity.

Table 6 shows the values of the W statistic and the calcu- 
lated significance of the test in each relevant case. From these 
results we see that our sample of 31 stars satisfies well the as- 
sumption of normality: only for the observed semi-amplitudes 
is there any evidence, at the ≈ 5% level, for significant
deviation from a normal distribution. In the larger samples,
however, the validity of the normal assumption is more marginal.
In fact for our full sample of 39 stars only for the mean colour 
would the null hypothesis be accepted at the 15% level, and for 
the maximum colour it would be quite strongly rejected, at a 
level of 0.01%. Clearly, then, we can safely apply hypothesis 
tests based on normality to the restricted sample of 31 stars, 
but must be somewhat more cautious in drawing conclusions 
from their application to the larger samples. We will comment 
further upon this point in Sect. 6 and Sect. 7. 

Finally, it is important to note that - in modelling the joint 
distribution of the six relevant physical variables as multivari- 
ate normal - we do not make any explicit assumptions concern- 
ing the nature of the scatter in their observed distribution. In 
particular we do not attempt to separate intrinsic scatter and 
measurement error in their observed distribution. Undoubtedly 
some component of the variance in our fitted distance relations 
will be due to observational errors and related effects such as 
line of sight spread in the true distance of the calibrating stars 
- both effects which are, at least in principle, removable. Our 
point is that the relative contribution of intrinsic scatter and 
observational errors is unlikely to differ dramatically in each of 
our distance estimators. Hence our conclusions concerning the 
relative reduction in variance resulting from the use of
magnitudes and colours at maximum light will not be significantly
changed - even if the absolute value of the variance could be 
reduced in both cases by the use of more precise observations 
(observations which - notwithstanding these remarks - are now 
available from HST and 8m class terrestrial telescopes, and 
from the use of observations at redder wavelengths). 

6. Results and Discussion 

In this section we present the results of fitting the coefficients 
of the relations defined in equations (34) - (39) by a multilin- 
ear regression model as described in the appendix. Each rela- 
tion has been fitted using the three LMC samples - of 31, 35 
and 39 stars respectively. The relations have been converted 
to absolute magnitudes at mean and maximum light assuming
an LMC distance modulus of 18.5 mag (c.f.
Madore and Freedman, 1991; Pierce et al., 1994). Tables 7 and
8 present fitted PL and PLC relations respectively at mean and
maximum light, and also a period, luminosity, semi-amplitude
(PLA) relation - again fitted at both mean and maximum light.
Table 9 presents the results of fitting a period, maximum lu- 
minosity, mean luminosity, maximum colour (PLL'C) and pe- 
riod, maximum luminosity, semi-amplitude, maximum colour 
(PLAC) relation. 
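The conversion from apparent to absolute magnitude used here can be sketched as follows. This is a minimal illustration assuming the standard distance modulus definition, m - M = 5 log10(d) - 5 with d in parsecs and absorption already corrected for. The first-order error propagation shown is a textbook approximation and is not necessarily identical to equation (33); all function names are hypothetical.

```python
import math


def distance_pc(distance_modulus):
    """Distance in parsecs implied by a distance modulus m - M."""
    return 10.0 ** ((distance_modulus + 5.0) / 5.0)


def absolute_magnitude(apparent_mag, distance_modulus=18.5):
    """Absolute magnitude of a star at the adopted LMC distance modulus."""
    return apparent_mag - distance_modulus


def fractional_distance_error(sigma_mag):
    """First-order fractional distance error for a magnitude dispersion
    sigma_mag (assumption: small errors, so d(ln d) = (ln 10 / 5) dM)."""
    return math.log(10.0) / 5.0 * sigma_mag


lmc_distance = distance_pc(18.5)        # roughly 50 kpc
example = fractional_distance_error(0.2)  # roughly a 9 per cent error
```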

The table headings indicate the form of the fitted relation 
and the variables used in the fit. The next rows indicate the 
values of the regression coefficients obtained. The final two 
columns of each table give the variance of M_* derived from the
linear regression (where, as before, a star indicates mean or
maximum light as appropriate) and the percentage rms error
of the corresponding distance estimator, calculated from
equation (33) - with the exception of the PLL'C relation in table
9, the distance error dispersion of which we discuss separately.
Next to the fitted regression coefficients for each relation we 
give their computed standard errors. For the coefficients of the 
relevant physical variables in each fit, we also indicate the com- 
puted probability of obtaining a sample regression coefficient 
larger in modulus than the estimated value, under the null hy- 
pothesis that the true regression coefficient is identically zero, 
applying the t test described in the appendix. This provides 
a clear and useful indication of the relative importance of the 
different independent variables in each relation. Finally, for 
the relations with two or more independent variables, we also 
present beneath each table the results of a second significance 
test involving the partial sample multiple correlation coefficient 
(SMCC) of each regression fit, as described in the appendix. 
This test provides a direct measure of the significance of in- 
cluding the final independent variable in reducing the overall 
dispersion of the fit. We list the partial SMCC for each fit and 
the probability that the value of the test statistic W be greater
than its observed value under the null hypothesis that the true 
partial SMCC be identically zero: i.e. that the final indepen- 
dent variable makes no contribution to the reduction of the 
scatter in the relation. 

6.1. PL relations 

Table 7 presents our results for equations (34) and (35), the PL 
relations at mean and maximum light. Note that there is hardly 
any difference in the variance of the fit, or the dispersion of 
the corresponding distance estimator, between mean and
maximum light for each of the three samples. As one would
naturally expect, the regression coefficient of logP is in all cases
highly significant: Cepheid magnitudes at mean or maximum
light are not well described by a constant. The coefficient of
logP in each case varies from about -2.0 to -2.5, and is larger
in modulus for the maximum light relation for each sample
size, with comparable standard error - although the difference
is always within 2σ. This range is somewhat different from
existing PL relations for the LMC - for example -2.88 (MF), 
-2.69 (CC) and a range from -2.59 to -2.90 (MWF). Table 3 of 
MWF lists the data set used in their work: as we noted in Sect. 
5, our data set is a subset of this sample. The results of table 
7 indicate, therefore, that the coefficient of logP in a linear 
least squares fit to a subset of the MWF data set differs from 
the value obtained when the entire data set is used. This does 
reinforce the fact that PL relations are inherently statistical 
in nature and care must be taken in comparing the results of
studies which sample different period ranges, a point also
recently addressed by Pierce et al. (1994). In the present context,
this means that we can usefully compare the dispersion of dif- 
ferent distance relations obtained for the same sample - of 31, 
35 or 39 stars - but must be more cautious in comparing re- 
sults for a given relation across the three samples, particularly 
in the light of the normality test results described in Sect. 5. 

6.2. PLC and PLA relations

Table 8 presents our results for PLC relations, based on
equations (36) and (37), at mean and maximum light. We can see
that there is an appreciable reduction in the dispersion of each 
relation, and for all three samples, compared with the
corresponding PL relations. This conclusion is borne out in a
quantitative manner in several ways. First note that the t test
applied to the regression coefficients strongly rejects the null
hypothesis of a zero regression coefficient for colour in all cases,
although it is interesting to note that the significance of a non- 
zero regression coefficient of log P is still considerably higher 
than that for colour, suggesting that the greater contribution 
to the reduction of scatter is coming from the PL relation. 
Notwithstanding this, the results of the W test applied to the 
partial SMCCs for each relation do confirm that the addition 
of colour significantly reduces the scatter compared with the 
PL relation. In all cases the null hypothesis of no reduction 
in scatter is rejected strongly, with a probability of acceptance 
of 4.9 X 10~* or less. Hence, our application of this hypothesis 
test reconfirms the results of MWF and CC, who established the
existence of a colour term for LMC Cepheids. The coefficients
of log P and (B - V)_mean which we have obtained are in all
cases within one standard error of the values given by MWF 
and CC. 

We can see from table 8 that the PLC relation at maximum 
light leads to a distance estimator with about 10% less
dispersion than its counterpart at mean light for the three samples.
This result is further supported by the fact that the partial SMCC
test more strongly rejects the null hypothesis for the PLC
relation at maximum light than at mean light. Applying an F test
to determine the significance of this reduction in dispersion, 
we find a significance of 0.27, 0.24 and 0.34, for the samples of 
31, 35 and 39 stars respectively. Thus, our sample of calibrat- 
ing Cepheids is still too small to confirm at a high significance 
level the improvement of the PLC relation at maximum light. 

Figures 5 and 6 show plots of period against (B - V) colour
at mean and maximum light respectively. From these plots 
one can conclude that there is only a very small reduction in 
the width of the instability strip at maximum light for our 
LMC sample: a fact which may well account for the inferred 
reduction in scatter of only 10% in the maximum light PLC 
relation. Moreover, it can be seen from figure 3 that the
maximum (B - V) colour is essentially independent of period for
logP < 1.5 (c.f. SKM). Figure 7 of SKM, however, illustrates
that the temperature range of Galactic Cepheids at maximum
light is only about 600 K. This suggests that a PLC relation
at maximum light constructed for Galactic Cepheids may lead
to a bigger reduction in scatter compared to that at mean light.



[Figure panel title: LMC Cepheids data: 39 stars]

Fig. 5. log period plotted against unreddened B - V colour at mean
light, for the full sample of 39 LMC stars

[Figure panel title: LMC Cepheids data (Martin): 39 stars]

Fig. 6. log period plotted against unreddened B - V colour at
maximum light, for the full sample of 39 LMC stars



Table 8 also presents the results of fitting PLA relations
to the LMC data, which represent a projection of the PLAC
relation suggested by equation (38) and considered below. In
all cases it was found that the addition of a semi-amplitude
term did not significantly reduce the dispersion of the
relation. The partial SMCC test accepted the null hypothesis at a
level of more than 10% and the regression coefficient of semi- 
amplitude was not significantly different from zero in any case. 
What is interesting to note, however, is that the PLA relation 
appeared to fare best when applied to the full sample of 39 
stars in the sense that, in this case, the null hypothesis of the 
partial SMCC test is accepted least strongly. This is perhaps 
not too surprising since the 39 star sample includes the group
of four stars which, from figure 1, clearly have unusual
amplitudes - and one might expect that a relation involving an
amplitude term would be most readily able to accommodate
such stars. It would be unwise to attach too much importance 
to this result, however, since we have shown in Sect. 5 that 
the semi-amplitudes satisfy the assumption of normality least 
accurately, which may in turn adversely affect the results of 
any hypothesis test based on this assumption. 



6.3. PLL'C and PLAC relations 

Table 9 presents the results of our fits to the PLL'C relation, in 
the modified form of equation (41), and PLAC relation given 
by equation (38). Firstly note that the PLAC relation does not 
offer a significant reduction in dispersion over the PLC relation 
at mean or maximum light in the samples of 31 or 35 stars, 
consistent with the results of the previous section. There is a 
significant improvement in the relation for the 39 star sample, 
however. The partial SMCC null hypothesis is rejected at the
3% level and the regression coefficient of semi-amplitude is con- 
sistent with zero at a similarly small probability. This would 
again seem to be due to the inclusion of the group of stars of 
unusual amplitude in our sample. Notwithstanding our earlier 
note of caution about the consistency of the semi-amplitude 
distribution with a normal distribution, we nevertheless be- 
lieve this result to indicate that a semi-amplitude term has an 
important role to play in improving Cepheid PLC relations -
not as a replacement for, but in addition to, a colour term -
when stars of unusual amplitude for their period are considered.

The PLL'C relation, on the other hand, is seen to have a 
considerably reduced dispersion - by a factor of two or more 
- for all three samples. The partial SMCC null hypothesis is 
now very strongly rejected and the regression coefficient of
M_mean - <M_mean> differs from zero with a very high
significance and small standard error.

The outstanding difficulty with the PLL'C relation lies with 
how one can convert it into a useful distance estimator. The ba- 
sic problem is that absolute magnitude at mean and maximum 
light are related to their apparent magnitude counterparts via
an identical function of distance, so that equation (39) may be
distance-degenerate: i.e. it yields essentially no distance
information if the coefficient of M_mean is close to unity. As we saw
in Sect. 4, we have attempted to overcome this problem by
introducing the sample averaged absolute magnitude at mean
light in a group of distant Cepheids - effectively turning the
M_mean term in equation (39) into a quantity which is both
distance independent and directly observable - but at the cost
of increasing the dispersion of our distance estimator due to
the sampling variance of <M_mean>, and thus restricting the
application of our PLL'C relation to a sufficiently large group 
of distant Cepheids. 

In figure 7 we plot the rms percentage error dispersion, A*, 
of the corresponding distance estimator, divided by the per- 
centage distance error of the PLC relation at maximum light, 
as a function of n, the number of distant Cepheids observed. 
We use the values of and derived from the LMC 

sample of 31 stars, and compute A* using equations (33) and 
(42). We can see from figure 7 that A* falls off quite slowly
with n. Moreover, we see that for small samples of 10 or fewer 
Cepheids A* is greater than unity - i.e. the resultant distance 
error dispersion of the PLL'C relation is in fact larger than 
that for the PLC relation. This is because the reduction in the 
variance of M_max is more than cancelled out by the large
sample variance of <M_mean>. For larger values of n, however, we
do see an appreciable reduction in the dispersion of the dis- 
tance estimator, falling to ~ 63% for n = 30 and to ~ 53% for 
n = 40. It seems, therefore, that our modified PLL'C can sig- 
nificantly improve upon the PLC relation for fairly large but 
realistic sample sizes of distant Cepheids. Similar results are 
obtained for the samples of 35 and 39 Cepheids. 

Of course we should note here that we are neglecting the
effect of sampling variance on the determination of the distri- 
bution parameters in our LMC calibrating data: clearly with 
a calibrating sample of less than 40 stars this effect cannot be 
considered negligible. It will have no bearing upon our compar- 
ison of the relative dispersion of any of our estimators, however, 
since it will affect every relation in precisely the same way. 
Moreover, its effect on the calibrating sample is in principle 
removable as the number of calibrating Cepheids is increased. 

Finally, we observe from table 9 the important fact that 
the regression coefficient of log P for all three samples is sig- 
nificantly different from the value predicted from equation 
(21), i.e. -3.33. In fact, if we force the coefficient of logP to
be -3.33 in our regression fit, we find that the coefficient of
M_mean - <M_mean> is fitted to be essentially zero. In other
words the fit reverts to the PLC relation at maximum light, 
with correspondingly increased dispersion. A similar sensitiv- 
ity is, in fact, found with the PLC relation at mean light, for 
which the theoretically expected coefficient of log P is about
2.




[Figure horizontal axis: number of Cepheids]

Fig. 7. rms percentage distance error, A*, derived from the PLL'C
relation - divided by the percentage distance error of the PLC
relation at maximum light - as a function of n, the number of observed
Cepheids.



4. Forcing the coefficient of log P to be equal to this value
results in a significant change in the (B - V)_mean coefficient.

The change in the coefficient of log P in the PLL'C relation 
would appear to be due to the very strong correlation between 
absolute magnitude at mean and maximum light (which, as 
we have discussed, renders the relation nearly distance degen- 
erate). Thus, while the measured values of log P and maximum 
colour continue to provide useful physical information to con- 
strain the inferred absolute magnitude at maximum light, the 
main contribution to M_max comes from the measured value of
M_mean - <M_mean> - and it is the relationship between these
two variables which dominates the form of the fitted relation. 
In a sense, therefore, the numerical values of the coefficients in 
our PLL'C relation are determined as much by the fact that 
we are combining observations of Cepheids at two phase points 
as by the underlying physical relationship between period, lu- 
minosity and colour. 

7. Conclusions 

Using the period-mean density relation, the Stefan-Boltzmann
law and the existence of a linear mass-luminosity law for
Cepheids, we have derived a new linear relation for Cepheids, 
connecting the maximum and mean luminosity, the period and 
effective temperature. This new equation suggests the possibil- 
ity of deriving PLC relations using observations of Cepheids at 
maximum light, and applying these relations to estimate the 
distance of Cepheids. We have fitted these new relations - to- 
gether with standard PL and PLC relations at mean light - to 
a sample of Cepheids in the LMC. We have adopted a general 
statistical model which allows for non-zero correlation between 
all pairs of the observables, and we have obtained distance es- 
timators which are 'optimal' in the sense of being unbiased and 
having minimum variance. 

More specifically, our conclusions are the following:

1. Our PL and PLC results at mean light are similar to those 
given in MWF and CC. Moreover, we also confirm that the 
introduction of a colour term for LMC Cepheids produces 
a significant reduction in scatter over a PL relation. 



2. We have derived a PLC relation for LMC Cepheids at light
maximum. The introduction of the colour term at maximum
light again offers a significant reduction in scatter
compared with the corresponding PL(max) relation. Moreover,
for these data we find that our PLC(max) relation has
around 10 percent less scatter than a PLC relation at mean
light.

3. The maximum light, maximum colour, period,
semi-amplitude relation generally has a comparable dispersion
to that obtained for PLC relations at mean or maximum
light, but has a significantly smaller dispersion when stars
with unusual amplitudes for their period are included in
the sample. This result provides evidence to support the
arguments leading to equation (15).

4. The maximum light, maximum colour, period and mean
magnitude relation given by equation (39) offers a highly
significant reduction in the dispersion of PLC relations at
maximum or mean light. In converting the relation into
a form which is not close to distance degenerate, we find
that the corresponding distance estimator has a dispersion
nearly 40% smaller than that derived from a PLC relation
at maximum light provided the relation is applied to a
sufficiently large (n = 30) group of equidistant Cepheids in,
e.g., a distant galaxy: an observational constraint which
does not seem too unreasonable. This relation appears to
derive both from the underlying physical relationship
between luminosity, colour and period, and from the fact that
Cepheid observations at two different phase points are used
in its construction. Our results would seem to offer support
for the derivation leading to equation (14), although
further analysis of other data sets would prove very useful in
order to more fully understand this relation.

8. Further Work 

We present below some topics which we feel deserve further 
investigation based on the results presented in this paper. 

1. MWF present individual reddening corrections for many of 
the stars in table 1. We propose to repeat the analysis using 
these reddenings instead of using E(B - V) = 0.1 for all
stars. The use of accurate reddening corrections would be 
important in the practical implementation of this method 
for distance determinations. It would also be important to 
model the geometry of the LMC, as is carried out in CC. 

2. CC present data for Cepheids in the SMC. We have started 
to further test our results on SMC Cepheids and other data 
sets such as those described in Table (1) of Jacoby et al.
(1992). 

3. We plan to investigate in more detail our 4 variable
estimators using maximum and mean light, period and maximum
colour. We would welcome receiving new Cepheid data sets, 
such as those available from the MACHO project and Hub- 
ble Telescope observations, in order to further test and ex- 
tend the results described here. We also plan to investigate 
the construction of PLL'C estimators using apparent mag- 
nitude data at two arbitrary phase points, not just at mean 
and maximum light as studied in this paper. 

4. We have begun a reexamination of calibrating Galactic 
Cepheids, as presented in e.g. table 2 of Feast and Walker 
(1987). Although Fernie and McGonegal (1983) found that 
the introduction of a colour term did not significantly
reduce the scatter of the PL relation at mean light, it would
be interesting to apply the results of this paper to Galactic
Cepheids and establish if a significant reduction in
dispersion can be achieved in this case at maximum light.

5. Successful completion of (1), (2), (3) and (4) will enable 
us to properly calibrate our new relations and compute a 
distance modulus to the LMC and SMC. 

6. Madore and Freedman (1991) show that the dispersion in
Cepheid PL relations at mean light decreases as wavelength 
increases. In this work we have used B — V observations. 
We plan to extend this by using observations at redder 
wavelengths in the hope that this will further reduce the 
scatter and in addition serve to reduce the effects of red- 
dening and metallicity. Thus our method might serve to 
further improve the large body of work carried out in the 
infrared in the past decade, which has already offered a very 
significant improvement in Cepheid distance estimators.

7. Given that our new estimators using maximum light hold 
up to the tests mentioned in (1) and (2) - and given that 
they depend upon the line of sight dispersion in the true 
distance of the observed Cepheids being small, it seems ap- 
propriate to apply our estimator to Cepheids in galaxies at 
cosmological distances. Prime candidates for such analysis 
would seem to be IC 4182 (Saha et al., 1994) and M100
(Freedman et al., 1994). It should be noted that the former
authors adopted a Galactic metallicity for IC 4182. 

It might be argued that the extra work needed in obtaining 
light curves with sufficient phase coverage to define an accurate 
maximum is not justified by the relatively small reduction in
dispersion when using maximum light relations. At least in the 
case of the LMC, however, the Cepheid data already obtained 
from the MACHO project (Alcock et al., 1994) mean that such
observations are already in place. 

Much of the above work is well under way and will be 
reported upon in subsequent papers.

10. Appendix: the statistical model

The distance estimators introduced in Sect. 4 depend upon var- 
ious linear combinations of the following six observable quan- 
tities: maximum apparent magnitude, mean apparent mag- 
nitude, maximum colour, mean colour, log period and semi- 
amplitude. More specifically, the relations set down in equa- 
tions (34) - (41) involve using a linear combination of a subset 
of these observables to infer an estimate of the absolute magni- 
tude at mean or maximum light, which is then combined with 
the apparent magnitude to obtain a distance estimator follow- 
ing equation (24) or (25). In order to compare the variance 
of these distance estimators, we therefore require to adopt an 
appropriate statistical model for the joint distribution of the 
six intrinsic physical quantities: maximum absolute magnitude, 
mean absolute magnitude, maximum colour, mean colour, log 
period and semi-amplitude. (Of course, since the calibrating 
Cepheids in the LMC are assumed to be equidistant, any model 
for the distribution of mean and maximum absolute magni- 
tudes will apply equivalently to apparent magnitudes corrected 
for absorption). 

We adopt a model which is general, realistic and non- 
restrictive, while remaining analytically tractable: an obvious 
candidate is the multivariate normal distribution. Thus, we
assume that the above six physical quantities are drawn from a
distribution denoted by,

$$\mathbf{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \qquad (A.1)$$

with probability density function given by,

$$p(\mathbf{x}) = (2\pi)^{-3} |\boldsymbol{\Sigma}|^{-1/2} \exp\left[-\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{T} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right] \qquad (A.2)$$

where $\mathbf{X}$ is the vector of variables, $\boldsymbol{\mu}$ is the vector of their
mean values and $\boldsymbol{\Sigma}$ is the 6x6 covariance matrix describing
their mutual correlation. $\boldsymbol{\Sigma}$ is assumed to be positive definite,
real, symmetric but is otherwise arbitrary. In particular, we do
not assume $\boldsymbol{\Sigma}$ to be diagonal: i.e. we allow for all six physical
variables to be correlated with each other - as clearly may be
the case in reality.

Let the elements of $\mathbf{X}$, $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ be denoted $(X_1, X_2, ..., X_6)$,
$(\mu_1, \mu_2, ..., \mu_6)$ and $\sigma_{ij}$ respectively. The statistical problem
with which we are dealing concerns the optimal estimation,
or prediction, of $X_j$ for some j (denoting the mean or
maximum absolute magnitude) from a linear combination of the
measured values of some or all of the other variables, $X_i$. This
is an example of a linear prediction problem - a topic which
is treated extensively in the general statistics literature (c.f.
Kendall and Stuart 1963; Graybill, 1976).

Following the standard notation, we define the best
prediction function $g_1(X_2, ..., X_6)$ of, say, $X_1$ based upon the
measured values of $X_2, ..., X_6$ to be the function such that,

$$E\{[X_1 - g_1(X_2, \ldots, X_6)]^2\} \le E\{[X_1 - g(X_2, \ldots, X_6)]^2\} \qquad (A.3)$$

for all other such functions, $g(X_2, ..., X_6)$, of $(X_2, ..., X_6)$ (c.f.
eqn (12.2.2) of Graybill, 1976). That is, the best prediction
function is 'best' in the sense of minimum variance and
unbiased - i.e. minimum risk, as we define in Sect. 4.

It can be shown (c.f. Graybill 1976; Kendall and Stuart
1963) that when $\mathbf{X}$ has a multivariate normal distribution, the
best linear prediction function of $X_1$ based upon the measured
values of $X_2, ..., X_6$ is

$$E(X_1 | X_2 = x_2, \ldots, X_6 = x_6) = \beta_1 + \sum_{k=2}^{6} \beta_k x_k \qquad (A.4)$$

where $x_j$ denotes the measured value of the variable $X_j$, and
the regression coefficient, $\beta_j$, is defined by

$$\beta_j = \frac{{\rm cov}(X_1, X_j | X_2, \ldots, X_{j-1}, X_{j+1}, \ldots, X_6)}{{\rm var}(X_j | X_2, \ldots, X_{j-1}, X_{j+1}, \ldots, X_6)} \qquad (A.5)$$


In other words, the best linear prediction function of $X_1$
is given simply by the linear regression of $X_1$ on $X_2, ..., X_6$.
Obviously the equivalent result holds for the best linear
predictor of the other $X_j$. Moreover, the result also holds in the
case where the covariance matrix, $\boldsymbol{\Sigma}$, is not known a priori but
must be estimated from observations - as is obviously the case
in the current study. In this event, the covariance matrix - and
the regression coefficients in equation (A.5) calculated from it
- are simply replaced by their sample estimates.
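The replacement of the covariance matrix by its sample estimate can be illustrated with a short sketch. It assumes the standard result that the regression slopes solve the linear system formed by the sample covariance matrix of the predictors and their sample covariances with the predicted variable; the function names and the toy numbers are hypothetical and are not our LMC data.

```python
def covariance(u, v):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (n - 1)


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def regression_from_covariance(y, predictors):
    """Intercept and slopes of the linear regression of y on the predictor
    columns, computed from sample covariances only."""
    cov_xx = [[covariance(p, q) for q in predictors] for p in predictors]
    cov_xy = [covariance(p, y) for p in predictors]
    slopes = solve(cov_xx, cov_xy)
    n = len(y)
    intercept = sum(y) / n - sum(b * sum(p) / n
                                 for b, p in zip(slopes, predictors))
    return intercept, slopes


# Toy, made-up numbers: y = 1 + 2*x1 - 3*x2 exactly.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0]
y = [1.0 + 2.0 * a - 3.0 * b for a, b in zip(x1, x2)]
intercept, slopes = regression_from_covariance(y, [x1, x2])
```

For noise-free toy data of this kind the fit recovers the generating coefficients to rounding error; with real observables the same call would return the sample regression coefficients.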

Our procedure for finding the optimum set of prediction
variables for M_max or M_mean is based upon the forward selection
procedure, as described in e.g. Chapter 12 of Graybill (1976).
Again, we outline this procedure where one is predicting $X_1$
from measurements of the other variables, with the obvious
equivalent procedure for predicting some other $X_j$.

First, we determine the sample multiple correlation
coefficient, $\rho_{1(k)}$, of $X_1$ with $X_k$ for k = 2,..., 6, and identify the
largest of these in modulus. Suppose, without loss of
generality, that this is $\rho_{1(2)}$. Then $X_2$ is the best single predictor of
$X_1$. We then compute all multiple correlation coefficients with
$X_1$ of pairs of variables which include the best single predictor,
$X_2$: i.e. $\rho_{1(2,3)}, \rho_{1(2,4)}, ..., \rho_{1(2,6)}$. Again, we select the largest -
say $\rho_{1(2,3)}$. It then follows that $X_2$ and $X_3$ are the best two
predictors of $X_1$ which include the best single predictor $X_2$.
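The forward selection steps just outlined can be sketched as follows. This is a minimal illustration only: it ranks candidate variables by the squared sample multiple correlation coefficient, and it stops when the gain falls below a fixed threshold, `min_gain`, which is a hypothetical stand-in for the significance tests described in this appendix. All names and data are assumptions for the example.

```python
def covariance(u, v):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (n - 1)


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def r_squared(y, columns):
    """Squared sample multiple correlation coefficient of y with columns."""
    cov_xx = [[covariance(p, q) for q in columns] for p in columns]
    cov_xy = [covariance(p, y) for p in columns]
    beta = solve(cov_xx, cov_xy)
    return sum(b * c for b, c in zip(beta, cov_xy)) / covariance(y, y)


def forward_selection(y, candidates, min_gain=1e-9):
    """Add one predictor at a time, always keeping those already chosen.
    candidates maps a variable name to its column of measurements.
    Returns a list of (name, r_squared) in order of entry."""
    chosen, history, best = [], [], 0.0
    remaining = list(candidates)
    while remaining:
        scores = [(r_squared(y, [candidates[c] for c in chosen + [name]]), name)
                  for name in remaining]
        r2, name = max(scores)
        if r2 - best <= min_gain:  # hypothetical stopping rule
            break
        chosen.append(name)
        history.append((name, r2))
        remaining.remove(name)
        best = r2
    return history
```

Called with a dictionary of observables, the function returns the order in which variables enter the prediction function together with the correlation achieved at each step.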

We can continue this procedure until all the remaining vari- 
ables have been included in the prediction function or, more 
usefully, until the addition of a new variable does not apprecia- 
bly improve the estimator of Xi. We can test the significance 
of adding a given variable - and hence quantify what we mean 
by 'appreciably improve' the fit - in two different ways. First, 
we can test the null hypothesis that the regression coefficient
of a given variable, $X_j$, is equal to zero, i.e.,

$$H_0 : \beta_j = 0 \qquad (A.6)$$

If the null hypothesis is strongly rejected, then one should
include $X_j$ in the best linear prediction function. If the null
hypothesis is accepted, then $X_j$ has little effect on the prediction
of $X_1$ and could be considered superfluous. It can be shown
(c.f. Graybill, 1976) that under the null hypothesis the
transformed variable $t_j = \hat{\beta}_j / \sigma(\hat{\beta}_j)$ has a t distribution with n - a
degrees of freedom. Here $\sigma$ denotes the standard error of the
estimated regression coefficient and n and a are the number
of data points in the linear regression fit and the number of
independent variables in the fit (i.e. the number of prediction
variables) respectively. Testing the null hypothesis that $\beta_j = 0$
is therefore equivalent to the test,

$$H_0 : t_j = 0 \qquad (A.7)$$

the results of which for each of our estimators are described in
detail in Sect. 6.

We can also test the significance of adding a given
variable in terms of the computed sample multiple correlation
coefficients of the different relations. Suppose we have applied
the forward selection procedure described above to identify the
best linear prediction function of $X_1$ as depending on the
variables $X_2, ..., X_q$, and we wish to test whether the new variable,
$X_{q+1}$, significantly improves the estimation of $X_1$. If $X_{q+1}$ has
no effect on the prediction then the multiple correlation
coefficient of $X_1$ with $X_2, ..., X_q$ will be identically equal to that of
$X_1$ with $X_2, ..., X_{q+1}$, i.e.,

$$\rho^2_{1(2,\ldots,q+1)} = \rho^2_{1(2,\ldots,q)} \qquad (A.8)$$


One can then use the sampled values of the multiple
correlation coefficients to construct a suitable hypothesis test based
upon this property, i.e. to test the null hypothesis,

$$H_0 : \hat{\rho}_{1(2,\ldots,q+1)} = \hat{\rho}_{1(2,\ldots,q)} \qquad (A.9)$$

where the caret denotes that the multiple correlation
coefficients are sample values estimated from the calibration data.

In fact, the test in equation (A.9) is equivalent to testing
the null hypothesis,

$$H_0 : \hat{\rho}_{1,q+1|(2,\ldots,q)} = 0 \qquad (A.10)$$

where $\hat{\rho}_{1,q+1|(2,...,q)}$ is the partial sample multiple correlation
coefficient of $X_1$ and $X_{q+1}$ given $X_2, ..., X_q$.

When $H_0$ is true, the test statistic for equation (A.10) is
distributed as a central F random variable with 1 and n - q - 2
degrees of freedom.
It is this hypothesis test which we have applied to each of the
relations introduced in Sect. 4.
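A test statistic of this kind can be formed from the squared multiple correlation coefficients of the fits with and without the new variable. The sketch below assumes the standard identity relating the squared partial correlation to these two quantities, and takes the denominator degrees of freedom as n - q - 2, following the statement above; the function name is hypothetical.

```python
def partial_f_statistic(r2_reduced, r2_full, n, q):
    """F-type statistic for adding one variable to an existing fit.

    r2_reduced : squared multiple correlation without the new variable
    r2_full    : squared multiple correlation with the new variable
    n          : number of data points in the fit
    q          : index of the last variable already in the fit
    Returns (squared partial correlation, statistic, degrees of freedom).
    """
    dof = n - q - 2  # assumption: degrees of freedom as quoted in the text
    # Standard identity for the squared partial correlation coefficient.
    partial_r2 = (r2_full - r2_reduced) / (1.0 - r2_reduced)
    statistic = dof * partial_r2 / (1.0 - partial_r2)
    return partial_r2, statistic, dof


# Toy numbers only, for a calibrating sample of 31 stars.
partial_r2, statistic, dof = partial_f_statistic(0.5, 0.75, n=31, q=2)
```

The probability of exceeding the observed value under the null hypothesis then follows from the F distribution with 1 and `dof` degrees of freedom, which would need a statistics library or tables since it is not provided here.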

Since the assumption of a multivariate normal distribution 
is required in most of the above, we have devoted Sect. 6 to 
the task of testing carefully how well our LMC data satisfy this 
normality assumption. 
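
The kinds of normality checks reported in Tables 4 to 6 can be sketched with standard library routines. The example below runs on synthetic data, not the LMC sample, and makes assumptions that should be checked against Sect. 6: the KS test is applied to the standardised variable, and skewness and kurtosis are expressed in units of their large-sample standard errors $\sqrt{6/n}$ and $\sqrt{24/n}$.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for one variable of a 31-star sample (NOT the LMC data).
rng = np.random.default_rng(1)
x = rng.normal(loc=14.0, scale=0.8, size=31)
n = len(x)

# Kolmogorov-Smirnov test against a normal distribution. Standardising with
# the sample mean and standard deviation makes the p value only approximate.
z = (x - x.mean()) / x.std(ddof=1)
d_obs, p_ks = stats.kstest(z, "norm")

# Shapiro-Wilk test: W statistic and its significance level.
w, p_sw = stats.shapiro(x)

# Skewness and excess kurtosis in units of their approximate standard errors.
skew_sigma = stats.skew(x) / np.sqrt(6.0 / n)
kurt_sigma = stats.kurtosis(x) / np.sqrt(24.0 / n)

print(f"KS: D_obs = {d_obs:.3f}, Prob = {p_ks:.3f}")
print(f"Shapiro-Wilk: W = {w:.3f}, significance = {p_sw:.4f}")
print(f"skewness = {skew_sigma:.2f} sigma, kurtosis = {kurt_sigma:.2f} sigma")
```

Large KS and Shapiro-Wilk probabilities, together with skewness and kurtosis within a few standard errors of zero, indicate no evidence against normality for that variable.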
TABLE 1 

LMC STARS 

Star        Period (days) 

HV2353        3.1080 
HV12765       3.4290 
HV12700       8.1530 
HV12823       8.3020 
HV2854        8.6350 
HV2733        8.7220 
HV12816       9.1140 
HV971         9.2970 
HV2301        9.4990 
HV6105       10.4400 
HV2864       10.9840 
HV874        12.6820 
HV2260       12.9360 
HV2527       12.9480 
HV997        13.1470 
HV2579       13.4310 
HV2352       13.6260 
HV955        13.7320 
HV2324       14.4660 
HV2549       16.1970 
HV2580       16.9450 
HV2836       17.5260 
HV1005       18.7100 
HV2793       19.1840 
HV1013       24.1264 
HV12815      26.1690 
HV1023       26.5880 
HV1002       30.4700 
HV899        31.0270 
HV2294       36.5270 
HV879        36.7820 
HV2338       42.1669 
HV877        45.1853 
HV2369       48.3190 
HV2827       78.8582 
HV5497       98.7802 
HV2883      109.000 
HV2447      119.4400 
HV883       134.000 
TABLE 2 

EXCLUDED LMC STARS 

Star        Period (days) 

HV5655       14.2110 
HV2262       15.8460 
HV909        37.5700 
HV2257       42.1669 
HV900        47.5330 
HV953        47.890 



TABLE 3 

UNUSUAL LMC STARS 

Star        Period (days) 

HV877        45.1853 
HV2827       78.8582 
HV5497       98.7802 
HV2883      109.000 
HV2447      119.440 
HV883       134.000 



TABLE 4 

RESULTS OF KS TEST APPLIED TO LMC DATA 

            Variable          $D_{obs}$   Prob($D_n > D_{obs}$) 

31 stars    Vmax              0.099       0.905 
            Vmean             0.103       0.880 
            (B - V)max        0.096       0.927 
            (B - V)mean       0.099       0.907 
            log P             0.125       0.688 
            semi-amplitude    0.144       0.504 

35 stars    Vmax              0.100       0.858 
            Vmean             0.120       0.663 
            (B - V)max        0.120       0.663 
            (B - V)mean       0.098       0.875 
            log P             0.121       0.651 
            semi-amplitude    0.151       0.372 

39 stars    Vmax              0.111       0.697 
            Vmean             0.152       0.304 
            (B - V)max        0.172       0.179 
            (B - V)mean       0.100       0.798 
            log P             0.124       0.562 
            semi-amplitude    0.150       0.315 
TABLE 5 

SKEWNESS AND KURTOSIS RESULTS 

            Variable          Skewness (no. of $\sigma$)   Kurtosis (no. of $\sigma$) 

31 stars    Vmax              0.56                         0.04 
            Vmean             0.72                         0.01 
            (B - V)max        0.57                         0.38 
            (B - V)mean       0.25                         0.42 
            log P             0.72                         0.29 
            semi-amplitude    0.52                         0.58 

35 stars    Vmax              0.77                         0.18 
            Vmean             0.98                         0.11 
            (B - V)max        1.51                         0.63 
            (B - V)mean       0.32                         0.43 
            log P             0.46                         0.51 
            semi-amplitude    0.61                         0.60 

39 stars    Vmax              0.60                         0.56 
            Vmean             1.02                         0.40 
            (B - V)max        1.68                         0.10 
            (B - V)mean       0.15                         0.65 
            log P             0.66                         0.09 
            semi-amplitude    0.46                         0.74 
TABLE 6 

SHAPIRO AND WILKS NORMALITY TEST RESULTS 

            Variable          $W$       Significance 

31 stars    Vmax              0.977     0.7546 
            Vmean             0.961     0.3630 
            (B - V)max        0.954     0.2432 
            (B - V)mean       0.959     0.3117 
            log P             0.944     0.1281 
            semi-amplitude    0.927     0.0431 

35 stars    Vmax              0.960     0.3010 
            Vmean             0.948     0.1276 
            (B - V)max        0.928     0.0301 
            (B - V)mean       0.972     0.5873 
            log P             0.965     0.3850 
            semi-amplitude    0.927     0.0283 

39 stars    Vmax              0.940     0.0508 
            Vmean             0.919     0.0090 
            (B - V)max        0.865     0.0001 
            (B - V)mean       0.953     0.1469 
            log P             0.948     0.0972 
            semi-amplitude    0.928     0.0197 
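TABLE 7 

FITTED PL RELATIONS 

31 stars 

M = a + b log P 

      $\hat{\beta}$   $\hat{\sigma}_{\beta}$   $\log P(\beta = 0)$   $\hat{\sigma}_M$   $\Delta$ 
a     -1.90           0.27                                           0.324              14.9 
b     -2.05           0.23                     -9.36 

Mmax = a + b log P 

      $\hat{\beta}$   $\hat{\sigma}_{\beta}$   $\log P(\beta = 0)$   $\hat{\sigma}_M$   $\Delta$ 
a     -1.88           0.26                                           0.316              14.5 
b     -2.48           0.22                     < -10 

35 stars 

M = a + b log P 

      $\hat{\beta}$   $\hat{\sigma}_{\beta}$   $\log P(\beta = 0)$   $\hat{\sigma}_M$   $\Delta$ 
a     -1.80           0.21                                           0.323              14.9 
b     -2.16           0.17                     < -10 

Mmax = a + b log P 

      $\hat{\beta}$   $\hat{\sigma}_{\beta}$   $\log P(\beta = 0)$   $\hat{\sigma}_M$   $\Delta$ 
a     -1.93           0.22                                           0.326              15.0 
b     -2.44           0.17                     < -10 

39 stars 

M = a + b log P 

      $\hat{\beta}$   $\hat{\sigma}_{\beta}$   $\log P(\beta = 0)$   $\hat{\sigma}_M$   $\Delta$ 
a     -1.64           0.18                                           0.317              14.6 
b     -2.30           0.13                     < -10 

Mmax = a + b log P 

      $\hat{\beta}$   $\hat{\sigma}_{\beta}$   $\log P(\beta = 0)$   $\hat{\sigma}_M$   $\Delta$ 
a     -2.04           0.18                                           0.316              14.5 
b     -2.34           0.13                     < -10 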
TABLE 8 

FITTED PLC AND PLA RELATIONS 

31 stars 

M = a + b log P + c(B - V) 

      $\hat{\beta}$   $\hat{\sigma}_{\beta}$   $\log P(\beta = 0)$   $\hat{\sigma}_M$   $\Delta$ 
a     -2.11           0.23                                           0.265              12.2 
b     -3.03           0.31                     < -10 
c      1.75           0.44                     -3.61 

partial SMCC = 0.597      $P(H_0)$ = 4.9 × 10^[illegible] 

Mmax = a + b log P + c(B - V)max 

      $\hat{\beta}$   $\hat{\sigma}_{\beta}$   $\log P(\beta = 0)$   $\hat{\sigma}_M$   $\Delta$ 
a     -2.39           0.22                                           0.235              10.8 
b     -2.87           0.18                     < -10 
c      2.05           0.41                     -4.80 

partial SMCC = 0.683      $P(H_0)$ = 3.2 × 10^[illegible] 

35 stars 

M = a + b log P + c(B - V) 

      $\hat{\beta}$   $\hat{\sigma}_{\beta}$   $\log P(\beta = 0)$   $\hat{\sigma}_M$   $\Delta$ 
a     -2.09           0.18                                           0.257              11.8 
b     -3.11           0.25                     < -10 
c      1.83           0.41                     -4.37 

partial SMCC = 0.622      $P(H_0)$ = 8.6 × 10^[illegible] 

Mmax = a + b log P + c(B - V)max 

      $\hat{\beta}$   $\hat{\sigma}_{\beta}$   $\log P(\beta = 0)$   $\hat{\sigma}_M$   $\Delta$ 
a     -2.32           0.16                                           0.227              10.5 
b     -2.92           0.14                     < -10 
c      2.00           0.33                     -6.25 

partial SMCC = 0.727      $P(H_0)$ = 1.1 × 10^[illegible] 
TABLE 8 continued 

39 stars 

M = a + b log P + c(B - V) 

      $\hat{\beta}$   $\hat{\sigma}_{\beta}$   $\log P(\beta = 0)$   $\hat{\sigma}_M$   $\Delta$ 
a     -2.01           0.16                                           0.251              11.6 
b     -3.22           0.22                     < -10 
c     [illegible]     [illegible]              -4.00 

partial SMCC = [illegible]      $P(H_0)$ = [illegible] 

Mmax = a + b log P + c(B - V)max 

      $\hat{\beta}$   $\hat{\sigma}_{\beta}$   $\log P(\beta = 0)$   $\hat{\sigma}_M$   $\Delta$ 
a     -2.11           0.13                                           0.234              10.8 
b     -2.96           0.15                     < -10 
c      1.64           0.29                     -5.92 

partial SMCC = 0.682      $P(H_0)$ = 2.4 × 10^[illegible] 

31 stars 

M = a + b log P + c(Mmean - Mmax) 

      $\hat{\beta}$   $\hat{\sigma}_{\beta}$   $\log P(\beta = 0)$   $\hat{\sigma}_M$   $\Delta$ 
a     -1.88           0.27                                           0.321              14.8 
b     -2.40           0.36                     -6.77 
c     -0.19           0.65                     -0.41 

partial SMCC = 0.230      $P(H_0)$ = 0.222 

Mmax = a + b log P + c(Mmean [illegible] 

      $\hat{\beta}$   $\hat{\sigma}_{\beta}$   $\log P(\beta = 0)$   $\hat{\sigma}_M$   $\Delta$ 
a     -1.88           0.27                                           0.321              14.8 
b     -2.40           0.36                     -6.77 
c      0.81           0.65                     -0.96 

partial SMCC = -0.054      $P(H_0)$ = 0.776 
TABLE 8 continued 

35 stars 

M = a + b log P + c(Mmean - Mmax) 

      $\hat{\beta}$   $\hat{\sigma}_{\beta}$   $\log P(\beta = 0)$   $\hat{\sigma}_M$   $\Delta$ 
a     -1.85           0.23                                           0.325              15.0 
b     -2.28           0.22                     < -10 
c      0.44           0.51                     -0.70 

partial SMCC = 0.150      $P(H_0)$ = 0.4 × 10^[illegible] 

Mmax = a + b log [illegible] 

      $\hat{\beta}$   $\hat{\sigma}_{\beta}$   $\log P(\beta = 0)$   $\hat{\sigma}_M$   $\Delta$ 
a     -1.85           0.23                                           0.325              15.0 
b     -2.28           0.22                     < -10 
c     -0.56           0.51                     -0.85 

partial SMCC = -0.190      $P(H_0)$ = 0.283 

39 stars 

M = a + b log P + c(Mmean - Mmax) 

      $\hat{\beta}$   $\hat{\sigma}_{\beta}$   $\log P(\beta = 0)$   $\hat{\sigma}_M$   $\Delta$ 
a     -1.84           0.21                                           0.310              14.3 
b     -2.32           0.13                     < -10 
c      0.51           0.32                     -1.21 

partial SMCC = 0.254      $P(H_0)$ = 0.124 

Mmax = a + b log P + c(Mmax - Mmean) 

      $\hat{\beta}$   $\hat{\sigma}_{\beta}$   $\log P(\beta = 0)$   $\hat{\sigma}_M$   $\Delta$ 
a     -1.84           0.21                                           0.310              14.3 
b     -2.32           0.13                     < -10 
c     -0.49           0.32                     -1.52 

partial SMCC = -0.245      $P(H_0)$ = 0.138 
TABLE 9 

FITTED PLL'C AND PLAC RELATIONS 

31 stars 

Mmax = a + b log P + c(B - V) + d(Mmean - <Mmean>) 

      $\hat{\beta}$   $\hat{\sigma}_{\beta}$   $\log P(\beta = 0)$   $\hat{\sigma}_M$ 
a     -3.95           0.89                                           0.078 
b     -0.89           0.14                     -6.19 
c      0.57           0.17                     -2.94 
d      0.83           0.05                     < -10 

partial SMCC = 0.946      $P(H_0)$ = 0.9 × 10^[illegible] 

Mmax = a + b log P + c(B - V)max + d(Mmean - Mmax) 

      $\hat{\beta}$   $\hat{\sigma}_{\beta}$   $\log P(\beta = 0)$   $\hat{\sigma}_M$   $\Delta$ 
a     -2.41           0.22                                           0.234              10.7 
b     -3.14           0.30                     < -10 
c      2.19           0.43                     -4.92 
d      0.56           0.50                     -0.87 

partial SMCC = 0.213      $P(H_0)$ = 0.267 

35 stars 

Mmax = a + b log P + c(B - V) + d(Mmean - <Mmean>) 

      $\hat{\beta}$   $\hat{\sigma}_{\beta}$   $\log P(\beta = 0)$   $\hat{\sigma}_M$ 
a     -4.13           0.90                                           0.083 
b     -0.92           0.15                     -6.40 
c      0.76           0.15                     -5.03 
d      0.79           0.05                     < -10 

partial SMCC = 0.932      $P(H_0)$ = 3.1 × 10^[illegible] 

Mmax = a + b log P + c(B - V)max + d(Mmean - Mmax) 

      $\hat{\beta}$   $\hat{\sigma}_{\beta}$   $\log P(\beta = 0)$   $\hat{\sigma}_M$   $\Delta$ 
a     -2.43           0.18                                           0.224              10.33 
b     -3.12           0.21                     < -10 
c      2.23           0.37                     -6.22 
d      0.53           0.40                     -1.01 

partial SMCC = 0.232      $P(H_0)$ = 0.193 
TABLE 9 continued 

39 stars 

Mmax = a + b log P + c(B - V) + d(Mmean - <Mmean>) 

      $\hat{\beta}$   $\hat{\sigma}_{\beta}$   $\log P(\beta = 0)$   $\hat{\sigma}_M$ 
a     -4.22           0.85                                           0.090 
b     -1.07           0.14                     -8.26 
c      1.02           0.12                     -9.47 
d      0.72           0.05                     < -10 

partial SMCC = 0.925      $P(H_0)$ = 3.0 × 10^[illegible] 

Mmax = a + b log P + c(B - V)max + d(Mmean - Mmax) 

      $\hat{\beta}$   $\hat{\sigma}_{\beta}$   $\log P(\beta = 0)$   $\hat{\sigma}_M$   $\Delta$ 
a     -2.40           0.18                                           0.22               10.2 
b     -3.19           0.17                     < -10 
c      2.16           0.36                     -6.33 
d      0.68           0.30                     -1.79 

partial SMCC = 0.352      $P(H_0)$ = 3.2 × 10^[illegible] 